Persistence and periodicity in a dynamic proximity network* 
The topology of social networks can be understood as being inherently dynamic, with edges 
having a distinct position in time. Most characterizations of dynamic networks discretize time by 
converting temporal information into a sequence of network "snapshots" for further analysis. Here 
we study a highly resolved data set of a dynamic proximity network of 66 individuals. We show that 
the topology of this network evolves over a very broad distribution of time scales, that its behavior 
is characterized by strong periodicities driven by external calendar cycles, and that the conversion 
of inherently continuous-time data into a sequence of snapshots can produce highly biased estimates 
of network structure. We suggest that dynamic social networks exhibit a natural time scale $\Delta_{\mathrm{nat}}$, 
and that the best conversion of such dynamic data to a discrete sequence of networks is done at this 
natural rate. 



Complex systems of interacting components are now often represented as a network,
i.e., n nodes or vertices joined together in pairs by m links or edges. Characterizing
these networks' topological patterns can often yield significant insights into the
structure and function of the original system [1-4], and networks from a wide variety
of domains, including social [5-7], technological [8, 9] and biological [10-12] systems,
have been studied in this way. From this kind of network analysis, it has been
demonstrated that many real-world networks exhibit similar properties. For instance,
most real-world networks exhibit a highly heterogeneous degree distribution and short
topological distances between arbitrary vertices.

Social networks differ from many other kinds of networks [13], for instance, by having
a larger-than-expected density of short loops (typically triangles - the so-called
"clustering coefficient" - although the behavior seems to generalize to short loops of
all lengths). Further, social networks exhibit a pattern in connectivity at the
whole-network level that is often called "community structure," in which large and
relatively dense subgraphs are themselves sparsely connected together [14], as well as
assortative mixing on various node attributes [15].

Much of our understanding of the large-scale structural patterns in social networks has
come from the study of static topologies - an idealization that naturally omits any
dynamic or time-evolving character of the social processes that underlie these networks.
For some scientific questions, a static topology may be sufficient. However, for
questions regarding dynamics, the temporal variation of the edges themselves is likely to
be important. For instance, in the spread of an epidemic, the order of interaction can
have a significant effect on which individuals become infected, and ultimately on the
size of an epidemic outbreak. For example, if two individuals A and B interact prior to
A being infected, but not after, then B has no risk of infection from A, a dynamic that
is difficult to capture with static topologies (Fig. 1).
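
The role of temporal ordering can be illustrated with a minimal sketch. It assumes, purely for illustration, that transmission is certain and instantaneous on contact; the function name `infected_set` and the `(i, j, t)` contact format are hypothetical and not part of the study.

```python
def infected_set(contacts, seed, t_seed=0.0):
    """Return the nodes reachable from `seed` along time-respecting contacts.

    `contacts` is an iterable of (i, j, t) tuples: i and j interact at time t.
    Transmission is assumed certain and instantaneous (an illustrative
    simplification, not a model of any real pathogen).
    """
    infected_at = {seed: t_seed}
    # Process contacts chronologically so that a node can only transmit
    # during contacts that occur at or after its own infection time.
    for i, j, t in sorted(contacts, key=lambda c: c[2]):
        for src, dst in ((i, j), (j, i)):
            if src in infected_at and infected_at[src] <= t and dst not in infected_at:
                infected_at[dst] = t
    return set(infected_at)


# A meets B at t=1, and only later (t=2) meets the infected individual C.
contacts = [("A", "B", 1.0), ("A", "C", 2.0)]
print(sorted(infected_set(contacts, seed="C")))  # ['A', 'C']
```

In the aggregated static graph all three individuals are connected, yet B escapes infection because the A-B contact precedes A's infection; reversing the two contact times would infect all three.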

Unfortunately, temporal connectivity data is often difficult to obtain for real social
systems. The traditional approach to studying network dynamics in sociology (see, for
instance, [16] and references therein) tends to rely heavily on interaction data
self-reported by study participants, which exhibit significant bias and noise [17].
Several recent studies in physics and computer science have utilized web-based or other
indirect sources of dynamic social network data [18-23]. In general, empirical studies
convert the available temporal data into a short sequence of non-overlapping network
"snapshots" (called "panel data" in the sociology literature [24]), each of a length of
time greater than the natural time scale of topological variation. This sequence can
then be further
FIG. 1: A schematic illustrating the impact of the temporal 
ordering of social interactions for an epidemic process (colored 
nodes) spreading on the network. In contrast to the lower 
panel, in which all three nodes appear to be connected, the 
upper panel shows that interactions that precede the time of 
infection do not propagate the contagion. 



analyzed to extract its time-varying structure, e.g., by converting the networks into a
scalar time series of some network statistic, or computing statistics over the entire
sequence. A notable exception to this emphasis on discrete time-series analysis of
dynamic behavior is the work of Kempe, Kleinberg and Kumar [25], who consider
theoretical questions related to processes on networks with edges that vary in
continuous time.

In this paper, we study the structural patterns of a dynamic social network of the sort
considered by Kempe, Kleinberg and Kumar, that is, a network whose edges effectively
vary continuously in time. We draw our data from the time-varying physical proximity of
115 individuals over the course of one month in the MIT Reality Mining study [26]. Here,
we consider several questions about the temporal connectivity patterns in this data.
First, we show that the persistence of proximity in this data set appears to
consistently follow a heavy-tailed distribution - possibly having log-normal form - an
observation that is, to our knowledge, novel. We then consider to what degree these
proximity networks exhibit periodicity, driven by the daily and weekly rhythms of human
social behavior. Finally, we consider the question of whether there is a natural length
of time over which to aggregate edges such that we preserve the important temporal
variation, while smoothing out some of the high-frequency variation in the data.

Network Statistics 

Before we begin our analysis, we will first briefly review a few commonly used network
metrics for static topologies [1], and their extension to dynamic topologies. We then
introduce a similarity statistic to measure the degree of topological overlap between
two networks, e.g., between two sequential snapshots.

Recall that the adjacency matrix $A$ of a simple graph is an $n \times n$ (0,1)-matrix
with a zero diagonal, where $n$ is the number of nodes in the network, and an element
$A_{ij} = 1$ if and only if vertices $i$ and $j$ are connected, and $A_{ij} = 0$
otherwise. Physical proximity is inherently an undirected quantity and thus our
adjacency matrices are symmetric. The degree $k_i$ of a vertex $i$ is thus simply the
row-sum $k_i = \sum_{j=1}^{n} A_{ij}$.

Our proximity data appear as tuples of the form $(i, j, t_1, t_2)$, denoting that $i$
and $j$ are proximate to each other starting at time $t_1$ and ending at time $t_2$,
where $t_1$ and $t_2$ are effectively real-valued numbers [27]. In order to transform
this information into a sequence of $T$ network snapshots
$\mathcal{A} = A^{(1)}, A^{(2)}, \ldots, A^{(T)}$, we must first choose a length of time
$\Delta$, which we call the snapshot rate, that each snapshot covers. We then simply say
that $A_{ij} = 1$ if and only if the nodes $i$ and $j$ are proximate at any time between
$t$ and $t + \Delta$, and $A_{ij} = 0$ otherwise. A natural question, which we will
consider shortly, is what effect the free parameter $\Delta$ has on the observed
patterns in dynamic network topology. When both $\Delta$ and the density of nodes in
physical space are small, these snapshots will necessarily be very sparse, each
containing only a few
FIG. 2: The empirical distribution of the persistence of edges in the network on doubly
logarithmic axes, for two individual weeks during, and for the full month of, October
for the core 66 subjects. Clearly, the network topology is evolving over a broad range
of time-scales, with both fast (less than 10 minute) and slow (greater than 100 minute)
variations. Surprisingly, the distributions appear to be roughly identical except for
the upper tail.



edges. In this sense, these snapshots differ markedly from most other social networks,
in which the average degree $\langle k \rangle > 1$ and a large connected component
exists.
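
The snapshot construction described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names `build_snapshots` and `mean_degree` are hypothetical, windows are taken as consecutive and non-overlapping starting at `t_start`, and an interval ending exactly on a window boundary is counted in the following window as well.

```python
import math


def build_snapshots(intervals, t_start, t_end, delta):
    """Aggregate proximity tuples (i, j, t1, t2) into snapshots of length `delta`.

    Returns a list of edge sets, one per window
    [t_start + k*delta, t_start + (k+1)*delta). An undirected edge {i, j} is
    present in every window that its proximity interval touches.
    """
    n_snap = math.ceil((t_end - t_start) / delta)
    snapshots = [set() for _ in range(n_snap)]
    for i, j, t1, t2 in intervals:
        if t2 < t_start or t1 >= t_end:
            continue  # interval lies entirely outside the observation period
        first = max(0, int((t1 - t_start) // delta))
        last = min(n_snap - 1, int((t2 - t_start) // delta))
        for k in range(first, last + 1):
            snapshots[k].add(frozenset((i, j)))
    return snapshots


def mean_degree(edges, n):
    """Average degree of an undirected graph with n nodes: 2m / n."""
    return 2 * len(edges) / n


# Two proximity events (times in minutes) among three subjects.
intervals = [(1, 2, 0, 7), (2, 3, 12, 13)]
for delta in (5, 15):
    snaps = build_snapshots(intervals, t_start=0, t_end=15, delta=delta)
    print(delta, [mean_degree(s, n=3) for s in snaps])
```

With the short window each snapshot holds a single edge, whereas the single long window merges both edges into one denser graph, which is the aggregation effect examined below.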

In addition to calculating two standard statistical measures of network structure - the
average degree $\langle k \rangle$ of a node and the local density of triangles $C$
(also called the clustering coefficient [28]) - over $\mathcal{A}$, we also calculate a
similarity measure of a vertex's connectivity at one snapshot to the next, which we call
the adjacency correlation $\gamma$.

Given a pair of adjacency matrices $A^{(x)}$ and $A^{(y)}$, one definition of the
similarity of the connectivity for a vertex $j$ at time $x$ and time $y$ would be the
Pearson correlation coefficient on the $j$th row vector at time $x$ and at time $y$.
However, in a sparse graph, most entries in each row are zero and our measure of
similarity would be uninformatively large. A more useful measure would compute the
correlation between the row elements that are non-zero at least once in either of the
two matrices; letting $N(j)$ denote this union, the adjacency correlation for $j$
becomes
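
A form consistent with this description, assuming an uncentered correlation restricted to the union $N(j)$, with superscripts $(x)$ and $(y)$ indexing the two snapshots, is

```latex
\gamma_j = \frac{\sum_{i \in N(j)} A^{(x)}_{ij} \, A^{(y)}_{ij}}
                {\sqrt{\left(\sum_{i \in N(j)} A^{(x)}_{ij}\right)
                       \left(\sum_{i \in N(j)} A^{(y)}_{ij}\right)}}
```

Because the entries are binary, the numerator counts the neighbors that $j$ has in both snapshots and each sum in the denominator counts the neighbors of $j$ in one snapshot, so under this assumed form $\gamma_j$ ranges from 0 (no shared neighbors) to 1 (identical neighbor sets).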



When the adjacency vectors contain no edges, the adjacency correlation is undefined, so
for convenience, we take $\gamma_j = 1$ in this case. Averaging $\gamma_j$ over all
vertices yields a statistic that represents the average topological overlap of the
neighbor set between two snapshots.
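
A minimal sketch of this statistic follows. It assumes the uncentered form $|N_x \cap N_y| / \sqrt{|N_x| \, |N_y|}$ for binary adjacency rows, where $N_x$ and $N_y$ are the neighbor sets of a vertex in the two snapshots. The function names, and the choice to return 0 when only one of the two snapshots has edges for a vertex, are assumptions of this sketch rather than details given in the text.

```python
import math


def adjacency_correlation(nbrs_x, nbrs_y):
    """Overlap between a vertex's neighbor sets in two snapshots."""
    if not nbrs_x and not nbrs_y:
        return 1.0  # convention stated in the text for edgeless adjacency vectors
    if not nbrs_x or not nbrs_y:
        return 0.0  # assumption: no overlap is possible if one set is empty
    return len(nbrs_x & nbrs_y) / math.sqrt(len(nbrs_x) * len(nbrs_y))


def neighbor_sets(edges, nodes):
    """Map each node to the set of its neighbors in an undirected edge list."""
    nbrs = {v: set() for v in nodes}
    for i, j in edges:
        nbrs[i].add(j)
        nbrs[j].add(i)
    return nbrs


def mean_adjacency_correlation(edges_x, edges_y, nodes):
    """Average the per-vertex adjacency correlation over all vertices."""
    nx, ny = neighbor_sets(edges_x, nodes), neighbor_sets(edges_y, nodes)
    return sum(adjacency_correlation(nx[v], ny[v]) for v in nodes) / len(nodes)


nodes = [1, 2, 3]
edges_x = [(1, 2), (1, 3)]   # earlier snapshot
edges_y = [(1, 2)]           # later snapshot: the edge 1-3 has disappeared
print(round(mean_adjacency_correlation(edges_x, edges_y, nodes), 3))  # 0.569
```

Here vertex 1 keeps one of its two neighbors, vertex 2 keeps its only neighbor, and vertex 3 loses its only neighbor, giving per-vertex values of about 0.707, 1 and 0.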
FIG. 3: The (a) mean degree $\langle k \rangle$, (b) mean clustering coefficient $C$ and (c) the network adjacency correlation $1 - \gamma$ as a function 
of time for snapshot rates $\Delta = \{1440, 480, 240, 60, 15, 5\}$ (minutes) during the week of 11 October through 17 October for the 
core 66 subjects. As $\Delta$ grows, under-sampling clearly averages out higher frequency fluctuations. 



Network Analysis 

We now turn to the question of what impact altering the snapshot rate $\Delta$ has on
the aforementioned structural statistics of the observed dynamic topology.

The Reality Mining study followed 115 subjects at MIT carrying proximity-aware mobile
phones for roughly nine months, collecting over 500 000 hours of proximity data [26]. At
five-minute intervals, each phone completes a Bluetooth device discovery scan over its
local area (7-10 m diameter) and records the identities of all discovered devices,
including the other subjects' phones. From this data, we first extract adjacencies for
each scan, and then infer ongoing proximity so as to annotate each edge with an
initiation and termination time. Here, we restrict our analysis to the 24 092 undirected
subject-subject adjacencies between 01 October 2004 and 31 October 2004 of the 66
subjects who work in the same building at MIT.

From the starting and ending times of each edge, we compute its temporal duration, or
persistence, as $t_2 - t_1$. Figure 2 shows the empirical distribution function for
these durations (in minutes) for both two weeks within and the full month of October.
Surprisingly, the weekly and monthly distributions are largely similar, with minor
variations in the upper tail ($x > 400$ minutes), suggesting a system largely in
equilibrium, or with strongly regular features. The average persistence of an adjacency
is relatively small ($\langle x \rangle = 22.83$ minutes), while three edges persist for
more than 1440 minutes (24 hours). The large variations in persistence demonstrate that
the network's topology evolves over a broad range of time-scales. We conjecture that
much of this regularity is driven by the strong periodicities in human behavior, e.g.,
the home-work-home daily cycle, and the work-home-work weekly cycle.
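
The persistence computation can be sketched as below. This is an illustration only: the function names are hypothetical, and the sketch assumes that the "empirical distribution function" is the complementary cumulative form $P(X \geq x)$, a common choice for plots on doubly logarithmic axes, which the text does not state explicitly.

```python
def persistences(intervals):
    """Durations t2 - t1 of proximity tuples (i, j, t1, t2)."""
    return [t2 - t1 for _, _, t1, t2 in intervals]


def empirical_ccdf(values):
    """Return [(x, P(X >= x))] for each distinct observed value x, ascending."""
    ordered = sorted(values)
    n = len(ordered)
    ccdf = []
    for idx, x in enumerate(ordered):
        # At the first occurrence of x, exactly n - idx values are >= x.
        if idx == 0 or x != ordered[idx - 1]:
            ccdf.append((x, (n - idx) / n))
    return ccdf


# Four toy proximity events with durations of 5, 5, 10 and 100 minutes.
intervals = [(1, 2, 0, 5), (1, 3, 10, 15), (2, 3, 0, 10), (1, 2, 20, 120)]
durations = persistences(intervals)
print(sum(durations) / len(durations))  # 30.0
print(empirical_ccdf(durations))        # [(5, 1.0), (10, 0.5), (100, 0.25)]
```

Even in this toy case a single long-lived edge pulls the mean well above the typical duration, which is the qualitative signature of a heavy upper tail.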

Now we turn to the question of what impact the snapshot rate $\Delta$ may have on the
observed network dynamics. We note that choosing a $\Delta$ larger than the natural
time-scale of the topological variation of the network will naturally cause
high-frequency variations to be averaged out, potentially obscuring important variation
in ordering such as that depicted in Figure 1. In most empirical studies of dynamic
networks, the snapshot rate is often determined by the method of data collection, or
chosen to guarantee a certain density of edges in each snapshot; here, we use our highly
temporally resolved data to explore the question of what kind of artifacts can be
introduced by an incorrect choice of $\Delta$, and whether there might be a natural
choice for proximity networks.

Choosing snapshot rates $\Delta = \{1440, 480, 240, 60, 15, 5\}$ minutes, we compute the
mean degree $\langle k \rangle$ (Fig. 3a), the clustering coefficient $C$ (Fig. 3b) and
the network adjacency correlation statistic $1 - \gamma$ (Fig. 3c) for each resulting
snapshot time series $\mathcal{A}$ for the seven days beginning on 11 October. The
shortest rate $\Delta = 5$ minutes exhibits high frequency noise overlaid on clear low
frequency structure from the daily work cycle. As we would expect, as the snapshot rate
increases, these high frequency variations are averaged out. When $\Delta = 1440$
minutes (1 day), each day appears largely the same, with slight differences between
weekdays and weekend days; applying our analysis to the entire month of October shows
that this regularity remains consistent over this longer period of time (data not
shown).

As an aside, we consider the effect that using an increasingly large snapshot rate
$\Delta$ has on the measured network statistics. Lengthening $\Delta$ has the effect of
increasing the density of edges in each snapshot; thus, we would expect the mean degree
and the clustering coefficient to increase with $\Delta$, while we would expect the
network adjacency correlation to decrease. Figure 4 shows these statistics averaged over
the set of snapshots derived from October, as a function of $\Delta$. The essentially
monotonic growth of these curves illustrates that the choice of snapshot rate strongly
affects the measured value of the network statistics. Although the shape of these curves
certainly conveys some information about how to choose $\Delta$, ultimately we would
like a more principled way to make that choice.

Figure 5 shows the autocorrelation function of the time series in Figure 3 for
$\Delta = 5$ minutes, the smallest snapshot rate.

FIG. 4: The values of each network metric (mean degree $\langle k \rangle / n$, mean
clustering coefficient $C$ and mean adjacency correlation $1 - \gamma$) during the month
of October as a function of snapshot rate $\Delta$; clearly, the value of each metric
depends on the choice of $\Delta$.



FIG. 5: The autocorrelation of three network metrics for $\Delta = 5$ minutes during the
week 11 October through 17 October. The correlations fall to zero at lags of
$\{6.08, 5.25, 6.25\}$ hours, respectively.



This analysis shows a strong - and expected - daily periodic behavior, but also that the
autocorrelation for each statistic drops to zero at roughly the same time, being
$t_{\mathrm{acf}=0} = \{6.08, 5.25, 6.25\}$ hours, respectively. Figure 6 shows the
power spectra for the same time series, with strong peaks in both the mean degree and
clustering coefficient spectra at roughly 1, 2 and 3 times a day, and at 1 and 2 times a
day for the adjacency correlation. Thus, we suggest that the natural snapshot rate for
dynamic proximity data corresponds to sampling at twice the highest of these
frequencies, i.e., a snapshot length of half the shortest of the corresponding periods,
or $\Delta_{\mathrm{nat}} = 4.08$ hours. Choosing this value gives a cross section of
the curves in Figure 4, and we report that the natural average degree is
$\langle k \rangle_{\mathrm{nat}} = 2.24$, the natural average clustering coefficient is
$C_{\mathrm{nat}} = 0.084$ and the natural average adjacency correlation is
$\gamma_{\mathrm{nat}} = 0.88$.
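
The procedure for picking a natural rate can be sketched on a synthetic series. This is a minimal illustration under stated assumptions: the function names are hypothetical, the spectrum is computed with a naive discrete Fourier transform, and the peak-selection rule (keep frequencies whose power is at least a fixed fraction of the maximum) is a stand-in chosen for this sketch, since the text does not specify how peaks were identified.

```python
import cmath
import math


def autocorrelation(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag (biased estimator)."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    var = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + lag] for t in range(n - lag)) / var
            for lag in range(max_lag + 1)]


def first_nonpositive_lag(acf):
    """First lag at which the autocorrelation reaches or drops below zero."""
    for lag, value in enumerate(acf):
        if value <= 0:
            return lag
    return None


def power_spectrum(series):
    """Naive O(n^2) DFT power at k = 1..n//2 cycles per record length."""
    n = len(series)
    mean = sum(series) / n
    power = []
    for k in range(1, n // 2 + 1):
        coeff = sum((x - mean) * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(series))
        power.append(abs(coeff) ** 2)
    return power


def natural_rate(series, step_hours, rel_threshold=0.05):
    """Half the period of the highest-frequency peak above the threshold.

    `rel_threshold` is a hypothetical peak-selection rule for this sketch.
    """
    power = power_spectrum(series)
    cutoff = rel_threshold * max(power)
    k_max = max(k for k, p in enumerate(power, start=1) if p >= cutoff)
    shortest_period = len(series) * step_hours / k_max
    return shortest_period / 2


# Synthetic hourly statistic over 10 days with 24 h, 12 h and 8 h cycles.
hours = range(240)
series = [math.cos(2 * math.pi * t / 24)
          + 0.5 * math.cos(2 * math.pi * t / 12)
          + 0.3 * math.cos(2 * math.pi * t / 8) for t in hours]
print(natural_rate(series, step_hours=1.0))  # 4.0
```

For this synthetic input the shortest retained period is 8 hours, so the sketch returns a 4 hour window, close to the 4.08 hours obtained from the proximity data.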

Discussion 

Much of the work to date on analyzing dynamic complex networks has focused on
constructing a sequence of network snapshots, in which edges that may be varying rapidly
in time are accumulated over a window of fixed length. In this manner, researchers have
begun to move beyond static topology and consider questions related to how the behavior
of processes changes when temporal ordering is taken into account [25] or how entities
such as a social group or community may change over time [23]. However, even the
snapshot approach to representing inherently continuous topological variation poses its
own set of problems, as the choice of snapshot rate $\Delta$ effectively determines many
of the statistical properties of the resulting networks. As such, we make a cautionary
note about the snapshot approach for studying dynamical social systems, as incorrectly
choosing $\Delta$ may impose
FIG. 6: The power spectra of three metric time series at $\Delta = 5$ minutes over the
course of the month of October. The principal peaks are at periods of $\{24, 12\}$
hours, with additional minor peaks at a period of 8 hours for the mean degree and
clustering coefficient.



a strong bias on the resulting analysis and conclusions. To resolve this problem, here
we suggest that there may be a natural snapshot rate for such dynamical systems that
appropriately smooths out high frequency variation while preserving important low
frequency structural patterns.

The natural rate that we calculate for the dynamic proximity network,
$\Delta_{\mathrm{nat}} = 4.08$ hours, is presumably closely related to both the length
of the day and the length of the human work day. Thus, the natural rate for other
dynamical social systems may vary considerably, depending on the particular context of
the social ties. In our case, this relatively long period may be related to the fact
that the 66 core subjects of the MIT study worked in a modern office building; proximity
networks in more physically active environments may exhibit a faster natural time scale.

Turning back to the larger question of studying dynamic social networks, the high
frequency data of the MIT study allow us to observe that the network topology is itself
evolving over a broad range of time scales. Although we have no clear distributional
idealization of the data shown in Figure 2, it seems unlikely that they are power-law
distributed [30], or that they otherwise exhibit "scale-free" behavior. Rather, we
suggest that proximity dynamics are a multi-scale phenomenon, with variation taking
place at several distinct scales, likely driven by external periodicities of calendar
cycles (e.g., day, week, month, season, year, etc.).

Finally, we suggest that the high frequency proximity network data could be used
productively in network epidemiology by answering the question of what impact the
temporal ordering of social contacts [25] has on the ultimate transmissibility of a
disease, and thus the likely size of an epidemic. The advantage of this proximity data
over more traditional approaches to network epidemiology is that it both removes any
possibility for a self-reporting bias by directly and automatically observing human
behavior, and directly records the network of who is close to whom, which is the network
over which many (but not all) pathogens spread.